Linkage Disequilibrium Score Regression

In statistical genetics, linkage disequilibrium score regression (LDSR or LDSC) is a technique that aims to quantify the separate contributions of polygenic effects and various confounding factors, such as population stratification, based on summary statistics from genome-wide association studies (GWASs). The approach uses regression analysis to examine the relationship between linkage disequilibrium (LD) scores and the test statistics of the single-nucleotide polymorphisms (SNPs) from the GWAS. Here, the "linkage disequilibrium score" for a SNP "is the sum of LD r² measured with all other SNPs". LDSC can be used to produce SNP-based heritability estimates, to partition this heritability into separate categories, and to calculate genetic correlations between separate phenotypes. Because the LDSC approach relies only on summary statistics from an entire GWAS, it can be used efficiently even with very large sample sizes. In LDSC, genetic correlations are calculated based on the deviation between the observed chi-square statistics and what would be expected assuming the null hypothesis.
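The regression described above can be sketched in a few lines of Python. This is a simplified illustration and not the reference LDSC software. It assumes the commonly stated model in which the expected chi-square statistic of a SNP grows linearly with its LD score, with slope N·h²/M (N samples, M SNPs, h² the SNP-based heritability) and an intercept that absorbs confounding. The function names `ld_scores` and `ld_score_regression` are hypothetical. The sketch also assumes an unweighted least-squares fit, LD computed over all SNPs rather than within a window, and no monomorphic SNPs in the genotype matrix.

```python
import numpy as np

def ld_scores(genotypes, include_self=False):
    """Return the LD score of each SNP.

    genotypes: array of shape (n_individuals, n_snps), e.g. allele counts.
    By default the score is the sum of r^2 with all *other* SNPs, following
    the definition quoted in the text; include_self=True adds the SNP's own
    r^2 of 1 (an optional convention, labeled here as an assumption).
    """
    r = np.corrcoef(genotypes, rowvar=False)  # SNP-by-SNP correlation matrix
    r2 = r ** 2
    scores = r2.sum(axis=1)
    if not include_self:
        scores = scores - np.diag(r2)
    return scores

def ld_score_regression(chi2, scores, n_samples):
    """Regress chi-square statistics on LD scores (unweighted sketch).

    Returns (h2, intercept), where h2 = slope * M / N under the assumed
    model E[chi2_j] = (N * h2 / M) * l_j + intercept.
    """
    slope, intercept = np.polyfit(scores, chi2, deg=1)
    n_snps = len(scores)
    h2 = slope * n_snps / n_samples
    return h2, intercept
```

Under this model, an intercept close to 1 suggests little confounding, while a larger intercept suggests inflation that does not scale with LD, such as population stratification.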


Extensions

LDSC can also be applied across traits to estimate genetic correlations. This extension of LDSC, known as cross-trait LD score regression, has the advantage of not being biased when the GWAS samples for the two traits overlap. Another extension of LDSC, known as stratified LD score regression (abbreviated SLDSR), aims to partition heritability by functional annotation by taking into account genetic linkage between markers.
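As a rough illustration of the cross-trait extension, the sketch below regresses the product of the two traits' z-scores on the LD score. It assumes the commonly stated cross-trait model, in which the slope is proportional to the genetic covariance and the intercept absorbs the effect of shared samples, which is why sample overlap does not bias the slope. The function name `cross_trait_ldsc` is hypothetical, the fit is unweighted, and the per-trait heritabilities are taken from the same kind of simple regression on squared z-scores.

```python
import numpy as np

def cross_trait_ldsc(z1, z2, scores, n1, n2):
    """Estimate a genetic correlation from two sets of GWAS z-scores.

    z1, z2: per-SNP z-scores for trait 1 and trait 2 (same SNP order).
    scores: per-SNP LD scores.
    n1, n2: GWAS sample sizes for the two traits.
    Returns (r_g, intercept), where the intercept of the z1*z2 regression
    reflects sample overlap and phenotypic correlation under the assumed model.
    """
    z1 = np.asarray(z1, dtype=float)
    z2 = np.asarray(z2, dtype=float)
    n_snps = len(scores)

    # Slope of z1*z2 on the LD score is proportional to genetic covariance.
    slope12, intercept12 = np.polyfit(scores, z1 * z2, deg=1)
    rho_g = slope12 * n_snps / np.sqrt(n1 * n2)

    # Single-trait regressions on chi-square (z^2) give the heritabilities.
    slope1, _ = np.polyfit(scores, z1 ** 2, deg=1)
    slope2, _ = np.polyfit(scores, z2 ** 2, deg=1)
    h2_1 = slope1 * n_snps / n1
    h2_2 = slope2 * n_snps / n2

    r_g = rho_g / np.sqrt(h2_1 * h2_2)
    return r_g, intercept12
```

With noisy real data the heritability estimates can be close to zero or negative, in which case the square root is undefined; a practical implementation would need to handle that case.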

